Can message-passing GNN approximate triangular factorizations of sparse matrices?
Trifonov, Vladislav, Muravleva, Ekaterina, Oseledets, Ivan
[...] Graph Neural Networks (GNNs) for learning sparse matrix preconditioners. While recent works have shown promising results using GNNs to predict incomplete factorizations, we demonstrate that the local nature of message passing creates inherent barriers for capturing non-local dependencies required for optimal preconditioning. We introduce a new benchmark dataset of matrices where good sparse preconditioners exist but require non-local computations, constructed using both synthetic examples and real-world matrices. Our experimental results show that current GNN architectures struggle to approximate these preconditioners, suggesting the need for new architectural approaches beyond traditional message passing networks. We provide theoretical analysis and empirical evidence to explain these limitations, with implications for the broader use of GNNs in numerical linear algebra.
Specifically, we show that there exist classes of matrices, starting from simple ones such as tridiagonal matrices arising from discretization of PDEs, where optimal sparse preconditioners exist but exhibit non-local dependencies: changing a single entry in A can significantly affect all entries in L. This means that message passing GNNs, having a limited receptive field, cannot represent such non-local mappings. To address these limitations, we introduce a new benchmark dataset of matrices where optimal sparse preconditioners are known to exist but require non-local computations. We construct this dataset using both synthetic examples and real-world matrices from the SuiteSparse collection. For synthetic benchmarks, we carefully design tridiagonal matrices where the Cholesky factors depend non-locally on the matrix elements by leveraging properties of rank-1 semiseparable matrices. For real-world problems, we explicitly compute so-called K-optimal preconditioners based on the inverse matrix with sparsity patterns matching [...]
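The non-local dependence described in this abstract can be illustrated with a small NumPy sketch. It is not the paper's benchmark construction (which relies on rank-1 semiseparable matrices); it only assumes a 1D Laplacian-type tridiagonal matrix, perturbs its first diagonal entry, and measures how much each row of the Cholesky factor changes. The function names, the size `n`, and the perturbation `delta` are hypothetical choices made for illustration.

```python
import numpy as np

def laplacian_1d(n):
    # Tridiagonal matrix tridiag(-1, 2, -1), as arises from a 1D PDE discretization.
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def cholesky_row_changes(n=50, delta=0.5):
    # Perturb a single entry A[0, 0] and compare the two Cholesky factors.
    A = laplacian_1d(n)
    B = A.copy()
    B[0, 0] += delta
    L_A = np.linalg.cholesky(A)
    L_B = np.linalg.cholesky(B)
    # Largest absolute change in each row of the factor.
    return np.abs(L_B - L_A).max(axis=1)

changes = cholesky_row_changes()
print("change in first row:", changes[0])
print("change in last row: ", changes[-1])
```

In this sketch every row of the factor changes, including the last one, although the change shrinks with distance from the perturbed entry. A message-passing network whose number of layers is smaller than the graph distance between the first and last node receives no information about that perturbation at the last node.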
FoLDTree: A ULDA-Based Decision Tree Framework for Efficient Oblique Splits and Feature Selection
Traditional decision trees are limited by axis-orthogonal splits, which can perform poorly when true decision boundaries are oblique. While oblique decision tree methods address this limitation, they often face high computational costs, difficulties with multi-class classification, and a lack of effective feature selection. In this paper, we introduce LDATree and FoLDTree, two novel frameworks that integrate Uncorrelated Linear Discriminant Analysis (ULDA) and Forward ULDA into a decision tree structure. These methods enable efficient oblique splits, handle missing values, support feature selection, and provide both class labels and probabilities as model outputs. Through evaluations on simulated and real-world datasets, LDATree and FoLDTree consistently outperform axis-orthogonal and other oblique decision tree methods, achieving accuracy levels comparable to the random forest.
Machine-Learned Closure of URANS for Stably Stratified Turbulence: Connecting Physical Timescales & Data Hyperparameters of Deep Time-Series Models
Meena, Muralikrishnan Gopalakrishnan, Liousas, Demetri, Simin, Andrew D., Kashi, Aditya, Brewer, Wesley H., Riley, James J., Kops, Stephen M. de Bruyn
We develop time-series machine learning (ML) methods for closure modeling of the Unsteady Reynolds Averaged Navier Stokes (URANS) equations applied to stably stratified turbulence (SST). SST is strongly affected by fine balances between forces and becomes more anisotropic in time for decaying cases. Moreover, there is a limited understanding of the physical phenomena described by some of the terms in the URANS equations. Rather than attempting to model each term separately, it is attractive to explore the capability of machine learning to model groups of terms, i.e., to directly model the force balances. We consider decaying SST which are homogeneous and stably stratified by a uniform density gradient, enabling dimensionality reduction. We consider two time-series ML models: Long Short-Term Memory (LSTM) and Neural Ordinary Differential Equation (NODE). Both models perform accurately and are numerically stable in a posteriori tests. Furthermore, we explore the data requirements of the ML models by extracting physically relevant timescales of the complex system. We find that the ratio of the timescales of the minimum information required by the ML models to accurately capture the dynamics of the SST corresponds to the Reynolds number of the flow. The current framework provides the backbone to explore the capability of such models to capture the dynamics of higher-dimensional complex SST flows.
Fourier Neural Differential Equations for learning Quantum Field Theories
Brant, Isaac, Norcliffe, Alexander, Liò, Pietro
By multiplying the representations of the interaction vertices, propagators, and particle lines known as Feynman rules [1], particle scattering amplitudes are derived from the interaction Hamiltonian. This is an example of a phenomenological connection between what is theoretically derived and what is empirically observed [2]. Neural Networks have been used to learn physical problems, including Hamiltonians [3], higher-order behaviour [4], and Fourier representations [5], where physical constraints are applied to network architecture, improving convergence and explainability. Neural Differential Equations (NDEs) [6] take the continuous time limit of a Residual Neural Network (ResNet) to learn differential equations. Integrating the learnt function through time outputs the network's hidden state to a continuous depth. NDEs have so far been applied to various quantum systems [7] [8] [9] [10], but not yet to scattering processes in Quantum Field Theory (QFT). In this paper, we investigate how NDEs can be used to bridge the phenomenological connection between experiment and theory by training these models on particle scattering data to learn scalar quantum field theories. The objectives are twofold: apply Neural Ordinary Differential Equations (NODE) to learn particle scattering.
Autonomous Drifting with 3 Minutes of Data via Learned Tire Models
Djeumou, Franck, Goh, Jonathan Y. M., Topcu, Ufuk, Balachandran, Avinash
Near the limits of adhesion, the forces generated by a tire are nonlinear and intricately coupled. Efficient and accurate modelling in this region could improve safety, especially in emergency situations where high forces are required. To this end, we propose a novel family of tire force models based on neural ordinary differential equations and a neural-ExpTanh parameterization. These models are designed to satisfy physically insightful assumptions while also having sufficient fidelity to capture higher-order effects directly from vehicle state measurements. They are used as drop-in replacements for an analytical brush tire model in an existing nonlinear model predictive control framework. Experiments with a customized Toyota Supra show that a scarce amount of driving data -- less than three minutes -- is sufficient to achieve high-performance autonomous drifting on various trajectories with speeds up to 45 mph. Comparisons with the benchmark model show a $4 \times$ improvement in tracking performance, smoother control inputs, and faster and more consistent computation time.
FedER: Federated Learning through Experience Replay and Privacy-Preserving Data Synthesis
Pennisi, Matteo, Salanitri, Federica Proietto, Bellitto, Giovanni, Casella, Bruno, Aldinucci, Marco, Palazzo, Simone, Spampinato, Concetto
In the medical field, multi-center collaborations are often sought to yield more generalizable findings by leveraging the heterogeneity of patient and clinical data. However, recent privacy regulations hinder data sharing and, consequently, the development of machine learning-based solutions that support diagnosis and prognosis. Federated learning (FL) aims at sidestepping this limitation by bringing AI-based solutions to data owners and only sharing local AI models, or parts thereof, that then need to be aggregated. However, most of the existing federated learning solutions are still in their infancy and show several shortcomings, from the lack of a reliable and effective aggregation scheme able to retain the knowledge learned locally to weak privacy preservation, as real data may be reconstructed from model updates. Furthermore, the majority of these approaches, especially those dealing with medical data, rely on a centralized distributed learning strategy that poses robustness, scalability and trust issues. In this paper we present a federated and decentralized learning strategy, FedER, that, exploiting experience replay and generative adversarial concepts, effectively integrates features from local nodes, providing models able to generalize across multiple datasets while maintaining privacy. FedER is tested on two tasks -- tuberculosis and melanoma classification -- using multiple datasets in order to simulate realistic non-i.i.d. medical data scenarios. Results show that our approach achieves performance comparable to standard (non-federated) learning and significantly outperforms state-of-the-art federated methods in their centralized (thus, more favourable) formulation. Code is available at https://github.com/perceivelab/FedER
Deep learning delay coordinate dynamics for chaotic attractors from partial observable data
Young, Charles D., Graham, Michael D.
A common problem in time series analysis is to predict dynamics with only scalar or partial observations of the underlying dynamical system. For data on a smooth compact manifold, Takens' theorem proves that a time delayed embedding of the partial state is diffeomorphic to the attractor, although for chaotic and highly nonlinear systems learning these delay coordinate mappings is challenging. We utilize deep artificial neural networks (ANNs) to learn discrete time maps and continuous time flows of the partial state. Given training data for the full state, we also learn a reconstruction map. Thus, predictions of a time series can be made from the current state and several previous observations with embedding parameters determined from time series analysis. The state space for time evolution is of comparable dimension to reduced order manifold models. These are advantages over recurrent neural network models, which require a high dimensional internal state or additional memory terms and hyperparameters. We demonstrate the capacity of deep ANNs to predict chaotic behavior from a scalar observation on a manifold of dimension three via the Lorenz system. We also consider multivariate observations on the Kuramoto-Sivashinsky equation, where the observation dimension required for accurately reproducing dynamics increases with the manifold dimension via the spatial extent of the system.
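The time delayed embedding mentioned in this abstract can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' code: the function name `delay_embed` and the parameters `dim` (embedding dimension) and `tau` (delay in samples) are hypothetical, and in the paper the embedding parameters are determined from time series analysis, not chosen by hand. Rows here are ordered forward in time, which is equivalent to the usual "current plus previous observations" form up to reindexing.

```python
import numpy as np

def delay_embed(x, dim, tau):
    # Build delay vectors [x[j], x[j + tau], ..., x[j + (dim - 1) * tau]]
    # from a scalar time series x. Returns an array of shape (n_rows, dim).
    x = np.asarray(x, dtype=float)
    n_rows = len(x) - (dim - 1) * tau
    if n_rows <= 0:
        raise ValueError("time series too short for this dim and tau")
    # Column k holds the series shifted by k * tau samples.
    return np.stack([x[k * tau : k * tau + n_rows] for k in range(dim)], axis=1)

# Usage: embed a scalar observable into three delay coordinates.
t = np.linspace(0.0, 20.0, 500)
signal = np.sin(t)
embedded = delay_embed(signal, dim=3, tau=10)
print(embedded.shape)
```

Each row of `embedded` is one point in the delay coordinate space. A learned discrete time map would take one row as input and predict the next.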
Optimizing differential equations to fit data and predict outcomes
Many scientific problems focus on observed patterns of change or on how to design a system to achieve particular dynamics. Those problems often require fitting differential equation models to target trajectories. Fitting such models can be difficult because each evaluation of the fit must calculate the distance between the model and target patterns at numerous points along a trajectory. The gradient of the fit with respect to the model parameters can be challenging to compute. Recent technical advances in automatic differentiation through numerical differential equation solvers potentially change the fitting process into a relatively easy problem, opening up new possibilities to study dynamics. However, application of the new tools to real data may fail to achieve a good fit. This article illustrates how to overcome a variety of common challenges, using the classic ecological data for oscillations in hare and lynx populations. Models include simple ordinary differential equations (ODEs) and neural ordinary differential equations (NODEs), which use artificial neural networks to estimate the derivatives of differential equation systems. Comparing the fits obtained with ODEs versus NODEs, representing small and large parameter spaces, and changing the number of variable dimensions provide insight into the geometry of the observed and model trajectories. To analyze the quality of the models for predicting future observations, a Bayesian-inspired preconditioned stochastic gradient Langevin dynamics (pSGLD) calculation of the posterior distribution of predicted model trajectories clarifies the tendency for various models to underfit or overfit the data. Coupling fitted differential equation systems with pSGLD sampling provides a powerful way to study the properties of optimization surfaces, raising an analogy with mutation-selection dynamics on fitness landscapes.
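The cost of evaluating a fit, as described in this abstract, can be made concrete with a small sketch. It assumes a classical Lotka-Volterra predator-prey ODE as the model, a hand-written fixed-step Runge-Kutta integrator, and a synthetic target trajectory. The parameter values, initial state, and step size are hypothetical, and no real hare and lynx data or automatic differentiation are used here.

```python
import numpy as np

def lotka_volterra(state, params):
    # Predator-prey right-hand side: state = (hare, lynx).
    hare, lynx = state
    a, b, c, d = params
    return np.array([a * hare - b * hare * lynx,
                     -c * lynx + d * hare * lynx])

def simulate(params, y0, dt, n_steps):
    # Classical fourth-order Runge-Kutta with a fixed step dt.
    traj = np.empty((n_steps + 1, 2))
    y = np.array(y0, dtype=float)
    traj[0] = y
    for k in range(n_steps):
        k1 = lotka_volterra(y, params)
        k2 = lotka_volterra(y + 0.5 * dt * k1, params)
        k3 = lotka_volterra(y + 0.5 * dt * k2, params)
        k4 = lotka_volterra(y + dt * k3, params)
        y = y + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        traj[k + 1] = y
    return traj

def fit_loss(params, target, y0, dt):
    # Mean squared distance between model and target at every stored time point.
    model = simulate(params, y0, dt, len(target) - 1)
    return float(np.mean((model - target) ** 2))

# Synthetic target generated from assumed "true" parameters.
true_params = (1.0, 0.5, 1.0, 0.25)
y0 = (2.0, 1.0)
target = simulate(true_params, y0, dt=0.05, n_steps=400)

loss_true = fit_loss(true_params, target, y0, 0.05)
loss_off = fit_loss((1.2, 0.5, 1.0, 0.25), target, y0, 0.05)
print(loss_true, loss_off)
```

Every call to `fit_loss` integrates the full trajectory, which is why gradient information obtained by differentiating through the solver is valuable: it avoids many repeated loss evaluations per parameter update.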
Conflict and Surprise: Heuristics for Model Revision
Any probabilistic model of a problem is based on assumptions which, if violated, invalidate the model. Users of probability based decision aids need to be alerted when cases arise that are not covered by the aid's model. Diagnosis of model failure is also necessary to control dynamic model construction and revision. This paper presents a set of decision theoretically motivated heuristics for diagnosing situations in which a model is likely to provide an inadequate representation of the process being modeled.
Multilingual Part-of-Speech Tagging: Two Unsupervised Approaches
Naseem, T., Snyder, B., Eisenstein, J., Barzilay, R.
We demonstrate the effectiveness of multilingual learning for unsupervised part-of-speech tagging. The central assumption of our work is that by combining cues from multiple languages, the structure of each becomes more apparent. We consider two ways of applying this intuition to the problem of unsupervised part-of-speech tagging: a model that directly merges tag structures for a pair of languages into a single sequence and a second model which instead incorporates multilingual context using latent variables. Both approaches are formulated as hierarchical Bayesian models, using Markov Chain Monte Carlo sampling techniques for inference. Our results demonstrate that by incorporating multilingual evidence we can achieve impressive performance gains across a range of scenarios. We also found that performance improves steadily as the number of available languages increases.